
Issue 725 #731

Open: wants to merge 3 commits into main


GawyWOOOHOOO

  1. Adding a flag to track whether any process was parallelized
  2. Track execution time for each process
  3. Modify Schedule.run() to generate feedback messages based on the flag and execution time status.
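A minimal sketch of steps 1 and 2, assuming a scheduler that is notified when each process starts and exits. `FeedbackTracker`, `ProcRecord`, `on_start`/`on_exit` and the `parallelized` argument are hypothetical names for illustration, not PaSh's API. It also uses `time.monotonic()` instead of the `datetime.now()` calls in the PR, because a monotonic clock is not affected by wall-clock adjustments. Where the `parallelized` value comes from is the hard part; the sketch takes it as given.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProcRecord:
    """Hypothetical per-process record (not PaSh's real class)."""
    parallelized: bool
    start: float = field(default_factory=time.monotonic)
    exec_time: Optional[float] = None  # seconds; stays None until the process exits


class FeedbackTracker:
    """Hypothetical helper covering step 1 (flag) and step 2 (timing)."""

    def __init__(self) -> None:
        self.procs: Dict[int, ProcRecord] = {}
        self.parallelized_flag = False  # step 1: did any process run in parallel?

    def on_start(self, process_id: int, parallelized: bool) -> None:
        self.procs[process_id] = ProcRecord(parallelized=parallelized)
        if parallelized:
            self.parallelized_flag = True

    def on_exit(self, process_id: int) -> float:
        record = self.procs[process_id]
        # step 2: elapsed time in seconds for this process
        record.exec_time = time.monotonic() - record.start
        return record.exec_time


if __name__ == "__main__":
    tracker = FeedbackTracker()
    tracker.on_start(1, parallelized=False)
    tracker.on_start(2, parallelized=True)
    print(tracker.parallelized_flag, tracker.on_exit(2) >= 0)
```

Step 3 would then read `parallelized_flag` and the recorded `exec_time` values once all processes have exited.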


OS =
CPU =
Ram =
Hash = 71dc3ea
Kernel=


OS:ubuntu-20.04
Sat Nov 16 22:21:02 UTC 2024
intro: 2/2 tests passed.
interface: 41/41 tests passed.
compiler: 54/54 tests passed.

compiler/pash.py (outdated)
@@ -35,6 +35,9 @@ def main():
return_code = preprocess_and_execute_asts(input_script_path, args, input_script_arguments, shell_name)

log("-" * 40) #log end marker

if args.debug >= 1:
log("Use the '-d 1' option for detailed debugging information.")
Member

I am not exactly sure what this is trying to do here.

@@ -295,6 +295,11 @@ def compile_and_add(self, compiled_script_file, var_file, input_ir_file):
pass
else:
self.running_procs += 1

if ast_or_ir is not None:
compile_success = True
Member

@angelhof, Nov 18, 2024

Why is this variable set here? Isn't it set previously?

Member

@angelhof left a comment

Thanks for the PR! I have left some comments with questions, and here are some more changes needed to get this merged:

  1. Rebase against binpash:future, which is the branch of PaSh where we make all new changes.
  2. Add a short example usage of this (for example, with an echo hi script and with a cat README.md | grep "foo" script) in docs/tutorial/tutorial.md.
  3. Add a test that checks this behavior (for echo hi and cat README.md | grep "foo"). This requires a new test category, which we can call api_tests, in this script (https://github.com/binpash/pash/blob/future/scripts/run_tests.sh). The test should check that when pash is invoked on these two scripts, its standard error contains these two messages.
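The stderr check in item 3 boils down to "run a command, capture standard error, look for expected substrings". run_tests.sh is a shell script, so this Python version only illustrates the logic. The helper names are hypothetical, and the PaSh entry point and script names in the commented usage line are assumptions that would need to match the repository.

```python
import subprocess
from typing import Iterable, List


def stderr_of(cmd: List[str]) -> str:
    """Run cmd and return its standard error as text; stdout is discarded."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        check=False,  # a nonzero exit status should not skip the message check
    )
    return result.stderr


def stderr_contains(cmd: List[str], expected: Iterable[str]) -> bool:
    """True if every expected message appears somewhere in cmd's stderr."""
    err = stderr_of(cmd)
    return all(message in err for message in expected)


# Hypothetical usage; entry point and script name are assumptions:
# stderr_contains(["./pa.sh", "echo_hi.sh"],
#                 ["[PaSh Warning] No region of the script was parallelized."])
```

Matching substrings rather than whole lines keeps the test stable if debug output is interleaved on stderr.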


if ast_or_ir is not None:
compile_success = True
if run_parallel:
Member

This does not indicate whether a fragment of the script was parallelized successfully. compile_success checks whether a script region was compiled successfully, which means that it was translated into a dataflow graph, i.e., we have annotations for all commands in it and every command is annotated as pure, parallelizable pure, or stateless. However, we also need to check whether any parallelization transformation was applied, which has to be kept as state and checked further in the compiler (see this function: https://github.com/binpash/pash/blob/future/compiler/pash_compiler.py#L227).
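To make the distinction concrete: a region can compile into a dataflow graph (compile_success) without any parallelization transformation being applied to it. A minimal sketch, assuming the compiler could report how many parallelizing rewrites it applied per region; `RegionResult` and `transformations_applied` are hypothetical names, not fields that exist in PaSh.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RegionResult:
    """Hypothetical per-region outcome reported by the compiler."""
    compiled: bool                # region became a dataflow graph (compile_success)
    transformations_applied: int  # hypothetical count of parallelizing rewrites

    @property
    def parallelized(self) -> bool:
        # Compiling is necessary but not sufficient: some rewrite must have happened.
        return self.compiled and self.transformations_applied > 0


def any_region_parallelized(results: Iterable[RegionResult]) -> bool:
    """Script-level flag: True only if at least one region was actually parallelized."""
    return any(r.parallelized for r in results)
```

With `[RegionResult(True, 0)]` the script-level flag stays False even though compilation succeeded, which is exactly the case a flag based only on compile_success would get wrong.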

@@ -336,9 +341,19 @@ def handle_exit(self, input_cmd):
## Get the execution time
command_finish_exec_time = datetime.now()
command_start_exec_time = self.process_id_input_ir_map[process_id].get_start_exec_time()
- exec_time = (command_finish_exec_time - command_start_exec_time) / timedelta(milliseconds=1)
+ exec_time = (command_finish_exec_time - command_start_exec_time).total_seconds()
Member

Why is this changed?
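The two versions of the line produce different units. Dividing a `timedelta` by `timedelta(milliseconds=1)` yields a float number of milliseconds, while `total_seconds()` yields seconds. The change therefore rescales `exec_time` by a factor of 1000 for `handle_time_measurement` (whose expected unit is not shown in this diff), and it is what makes the `exec_time < 1` check in `run()` mean one second rather than one millisecond. A small standard-library check:

```python
from datetime import datetime, timedelta


def elapsed_ms(start: datetime, finish: datetime) -> float:
    # timedelta / timedelta is a float ratio; the divisor fixes the unit (ms here)
    return (finish - start) / timedelta(milliseconds=1)


def elapsed_s(start: datetime, finish: datetime) -> float:
    # total_seconds() always reports seconds
    return (finish - start).total_seconds()


start = datetime(2024, 11, 18, 12, 0, 0)
finish = start + timedelta(seconds=1.5)
print(elapsed_ms(start, finish))  # 1500.0
print(elapsed_s(start, finish))   # 1.5
```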

log("Process:", process_id, "exited. Exec time was:", exec_time)
self.handle_time_measurement(process_id, exec_time)

proc_info = self.process_id_input_ir_map[process_id]
if proc_info.compiler_config.width > 1: # Check if it was parallelized
Member

This doesn't check if parallelization was successful, but just whether the compiler would even try to parallelize.

@@ -92,6 +92,8 @@ def compile_ir(ir_filename, compiled_script_file, args, compiler_config):
ret = None
try:
ret = compile_optimize_output_script(ir_filename, compiled_script_file, args, compiler_config)
if ret is None:
Member

I don't think this means that there were no parallelization opportunities for the whole script, but only for this region. Also, I think it might be subsumed by the exception handling. Is this code ever called?

@@ -414,6 +429,16 @@ def run(self):
self.parse_and_run_cmd(input_cmd)

self.connection_manager.close()
if not self.parallelized_flag:
log("No parts of the input script were parallelized. Ensure commands are annotated for parallelization.")
Member

These messages need to be logged regardless of the debug level, so they should be given a different level (as written, they will only be printed with -d 1). It would also be good to prefix them with [PaSh Warning]. Here is some wordsmithing to make them a bit clearer too:

  • [PaSh Warning] No region of the script was parallelized. Maybe you are missing relevant annotations? Use -d 1 for more info.
  • [PaSh Warning] Some script regions were parallelized but their execution times were negligible (<1s). If your script takes a long time maybe annotations are missing from relevant regions. Use -d 1 for more info.
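One way to separate the two kinds of output, sketched with hypothetical helpers rather than PaSh's actual `log` function (whose signature is not shown here): user-facing warnings always go to stderr with the `[PaSh Warning]` prefix, while diagnostic lines stay gated on the debug level.

```python
import io
import sys
from typing import Optional, TextIO

WARNING_PREFIX = "[PaSh Warning]"


def warn(message: str, stream: Optional[TextIO] = None) -> None:
    """Always printed, independent of -d; defaults to stderr."""
    print(f"{WARNING_PREFIX} {message}", file=stream if stream is not None else sys.stderr)


def debug_log(message: str, debug_level: int, stream: Optional[TextIO] = None) -> None:
    """Printed only when the debug level is at least 1 (e.g. -d 1)."""
    if debug_level >= 1:
        print(message, file=stream if stream is not None else sys.stderr)


if __name__ == "__main__":
    buf = io.StringIO()
    warn("No region of the script was parallelized.", buf)
    debug_log("region compiled", 0, buf)  # suppressed at debug level 0
    print(buf.getvalue(), end="")
```

Resolving `sys.stderr` at call time rather than as a default argument keeps the helpers working when stderr is redirected, for example by a test harness.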

if not self.parallelized_flag:
log("No parts of the input script were parallelized. Ensure commands are annotated for parallelization.")
elif all(
proc_info.exec_time is not None and proc_info.exec_time < 1
Member

Have we checked that this condition ever actually passes? I am skeptical about the proc_info.exec_time is not None part of the all(...). Is there any way that one of those is None, making this false?
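The concern can be checked directly. In the PR's condition, a single process whose `exec_time` is still None makes `all(...)` False, so the negligible-time message is skipped; conversely, `all()` over an empty collection is True. A hedged alternative, assuming unmeasured processes should simply be ignored (a policy choice, not something the PR specifies), is sketched below. The function names are hypothetical, and the 1-second threshold matches the PR's `< 1` check.

```python
from typing import Iterable, Optional

NEGLIGIBLE_SECONDS = 1.0  # matches the PR's `exec_time < 1` (seconds)


def pr_condition(exec_times: Iterable[Optional[float]]) -> bool:
    # Same shape as the PR: any None makes the whole expression False.
    return all(t is not None and t < NEGLIGIBLE_SECONDS for t in exec_times)


def all_negligible(exec_times: Iterable[Optional[float]]) -> bool:
    measured = [t for t in exec_times if t is not None]
    # all([]) is True, so require at least one measurement before concluding anything.
    return bool(measured) and all(t < NEGLIGIBLE_SECONDS for t in measured)
```

The PR also iterates over every process in `process_id_input_ir_map`; restricting the check to processes that were actually parallelized might match the intended message more closely.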

for proc_info in self.process_id_input_ir_map.values()
):
log("Some script fragments were parallelized, but their execution times were negligible.")
log("Consider optimizing your script to include longer-running tasks.")
Member

No need for this message, this is not really an optimization that we are asking them to do, but rather making sure that they have annotations for the long-running parts that they care about.

log("Some script fragments were parallelized, but their execution times were negligible.")
log("Consider optimizing your script to include longer-running tasks.")
else:
log("Parallelization completed successfully.")
Member

Not necessary, should be deleted.


OS =
CPU =
Ram =
Hash = ba3d0c0
Kernel=


OS:ubuntu-20.04
Fri Nov 29 23:05:55 UTC 2024
intro: 2/2 tests passed.
interface: 41/41 tests passed.
compiler: 54/54 tests passed.


OS:ubuntu-20.04
Fri Nov 29 23:37:50 UTC 2024
intro: 0/2 tests passed.
interface: 8/41 tests passed.
compiler: 38/54 tests passed.
demo-spell.sh are not identical
hello-world.sh are not identical
test1 are not identical
test2 are not identical
test3 are not identical
test4 are not identical
test5 are not identical
test6 are not identical
test8 are not identical
test9 are not identical
test10 are not identical
test12 are not identical
test13 are not identical
test14 are not identical
test15 are not identical
test16 are not identical
test17 are not identical
test18 are not identical
test_set are not identical
test_set_e are not identical
test_redirect are not identical
test_unparsing are not identical
test_set_e_2 are not identical
test_set_e_3 are not identical
test_new_line_in_var are not identical
test_cmd_sbst are not identical
test_cmd_sbst2 are not identical
test_trap are not identical
test_umask are not identical
test_var_assgn_default are not identical
test_exclam are not identical
test_redir_var_test are not identical
test_star are not identical
test_env_vars are not identical
test_redir_dup are not identical
diff.sh are not identical
diff.sh are not identical
set-diff.sh are not identical
set-diff.sh are not identical
export_var_script.sh are not identical
export_var_script.sh are not identical
comm-par-test.sh are not identical
comm-par-test.sh are not identical
comm-par-test2.sh are not identical
comm-par-test2.sh are not identical
tee_web_index_bug.sh are not identical
tee_web_index_bug.sh are not identical
fun-def.sh are not identical
fun-def.sh are not identical
bigrams.sh are not identical
bigrams.sh are not identical


OS =
CPU =
Ram =
Hash = 4896a16
Kernel=


OS:ubuntu-20.04
Fri Nov 29 23:55:40 UTC 2024
intro: 0/2 tests passed.
interface: 8/41 tests passed.
compiler: 38/54 tests passed.
demo-spell.sh are not identical
hello-world.sh are not identical
test1 are not identical
test2 are not identical
test3 are not identical
test4 are not identical
test5 are not identical
test6 are not identical
test8 are not identical
test9 are not identical
test10 are not identical
test12 are not identical
test13 are not identical
test14 are not identical
test15 are not identical
test16 are not identical
test17 are not identical
test18 are not identical
test_set are not identical
test_set_e are not identical
test_redirect are not identical
test_unparsing are not identical
test_set_e_2 are not identical
test_set_e_3 are not identical
test_new_line_in_var are not identical
test_cmd_sbst are not identical
test_cmd_sbst2 are not identical
test_trap are not identical
test_umask are not identical
test_var_assgn_default are not identical
test_exclam are not identical
test_redir_var_test are not identical
test_star are not identical
test_env_vars are not identical
test_redir_dup are not identical
diff.sh are not identical
diff.sh are not identical
set-diff.sh are not identical
set-diff.sh are not identical
export_var_script.sh are not identical
export_var_script.sh are not identical
comm-par-test.sh are not identical
comm-par-test.sh are not identical
comm-par-test2.sh are not identical
comm-par-test2.sh are not identical
tee_web_index_bug.sh are not identical
tee_web_index_bug.sh are not identical
fun-def.sh are not identical
fun-def.sh are not identical
bigrams.sh are not identical
bigrams.sh are not identical


OS =
CPU =
Ram =
Hash = c4808ee
Kernel=

@angelhof
Member

This needs to be rebased onto the future branch, BTW :)


github-actions bot commented Dec 5, 2024

OS =
CPU =
Ram =
Hash = 62864a4
Kernel=


github-actions bot commented Dec 5, 2024

OS:ubuntu-20.04
Thu Dec 5 19:13:31 UTC 2024
intro: 2/2 tests passed.
interface: 42/42 tests passed.
compiler: 54/54 tests passed.


github-actions bot commented Dec 5, 2024

OS =
CPU =
Ram =
Hash = 6aa88d8
Kernel=


github-actions bot commented Dec 5, 2024

OS:ubuntu-20.04
Thu Dec 5 19:51:04 UTC 2024
intro: 2/2 tests passed.
interface: 42/42 tests passed.
compiler: 18/54 tests passed.
grep.sh are not identical
grep.sh are not identical
minimal_sort.sh are not identical
minimal_sort.sh are not identical
minimal_grep.sh are not identical
minimal_grep.sh are not identical
topn.sh are not identical
topn.sh are not identical
wf.sh are not identical
wf.sh are not identical
spell.sh are not identical
spell.sh are not identical
shortest_scripts.sh are not identical
shortest_scripts.sh are not identical
alt_bigrams.sh are not identical
alt_bigrams.sh are not identical
deadlock_test.sh are not identical
deadlock_test.sh are not identical
double_sort.sh are not identical
double_sort.sh are not identical
no_in_script.sh are not identical
no_in_script.sh are not identical
for_loop_simple.sh are not identical
for_loop_simple.sh are not identical
minimal_grep_stdin.sh are not identical
minimal_grep_stdin.sh are not identical
micro_10.sh are not identical
micro_10.sh are not identical
sed-test.sh are not identical
sed-test.sh are not identical
tr-test.sh are not identical
tr-test.sh are not identical
grep-test.sh are not identical
grep-test.sh are not identical
ann-agg.sh are not identical
ann-agg.sh are not identical


2 participants